Project: SRP201470
Aligner: STAR (2.7.6a)
Genome: For mouse, the UCSC mm10 assembly available in iGenomes was used.
Informatics tools used:
Sequencing parameters:
For each sample, the following programs were run to generate the data necessary to create this report. The commands are written for unstranded paired-end data; for single-end reads, the R2 files and insert size metrics would be omitted.
java -Xmx1024m TrimmomaticPE -phred33 [raw_sample_R1] [raw_sample_R2] [sample_R1] [sample_R1_unpaired] [sample_R2] [sample_R2_unpaired] HEADCROP:[bases to trim, if any] ILLUMINACLIP:[sample_primer_fasta]:2:30:10 MINLEN:50
fastqc [sample_R1] [sample_R2]
cat [sample_R1/R2] | awk '((NR-2)%4==0){read=$1;total++;count[read]++}END{for(read in count){if(count[read]==1){unique++}};print total,unique,unique*100/total}'
The following STAR options were used:
STAR --genomeDir [ref_genome_index] --runThreadN 12 --outReadsUnmapped Fastx --outMultimapperOrder Random --outSAMmultNmax 1 --outFilterIntronMotifs RemoveNoncanonical --outSAMstrandField intronMotif --outSAMtype BAM SortedByCoordinate --readFilesIn [sample_R1] [sample_R2]
Using aligned output files accepted_hits.bam and unmapped.bam:
samtools sort accepted_hits.bam accepted_hits.sorted
samtools index accepted_hits.sorted.bam
samtools idxstats accepted_hits.sorted.bam > accepted_hits.sorted.stats
bamtools stats -in accepted_hits.sorted.bam > accepted_hits.sorted.bamstats
bamtools filter -in accepted_hits.sorted.bam -script cigarN.script | bamtools count
samtools view -c unmapped.bam
java -Xmx2g -jar CollectRnaSeqMetrics.jar REF_FLAT=[ref_flat file] STRAND_SPECIFICITY=NONE INPUT=accepted_hits.bam OUTPUT=RNASeqMetrics
java -Xmx2g -jar CollectInsertSizeMetrics.jar HISTOGRAM_FILE=InsertSizeHist.pdf INPUT=accepted_hits.sorted.bam OUTPUT=InsertSizeMetrics (for paired-end library)
The number of raw reads corresponds to reads that passed Casava QC filters, were trimmed by Trimmomatic to remove adaptors, and were aligned by STAR to ref_genome+ERCC transcripts, as reported in .info files. Unique read counts were obtained by running awk on the trimmed fastq files. FastQC estimates of the percentage of sequences remaining after deduplication were retrieved from fastqc_data.txt files. Bamtools statistics were based on sorted and indexed bam files. The mapped reads were those that mapped to the reference and were output by STAR to accepted_hits.bam. The unmapped reads were output by STAR to unmapped.bam. Some reads may map to multiple locations in the genome, so the number of total reads reported by bamstats may be greater than the number of raw reads. Junction-spanning reads are computed from accepted_hits.bam CIGAR entries containing “N”. Related text files that were saved:
SRP201470 _read_counts.txt
SRP201470 _duplicates.txt
SRP201470 _unique_counts.txt
SRP201470 _bamstats_counts.txt
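The two counting steps described above (unique reads from trimmed fastq files, and junction-spanning reads from CIGAR strings) can be sketched in Python as follows. This is only an illustration of the logic, not the pipeline's code: the function names are hypothetical, the example reads are made up, and the contents of cigarN.script are not shown in this report, so the CIGAR test below is an assumption about what that filter checks.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by an operation code.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def count_unique_reads(fastq_lines):
    """Mirror the awk one-liner: tally every sequence line (lines 2, 6, 10, ...)."""
    counts = Counter()
    for line_number, line in enumerate(fastq_lines, start=1):
        if (line_number - 2) % 4 == 0:
            counts[line.strip()] += 1
    total = sum(counts.values())
    # "Unique" means the sequence was seen exactly once, as in the awk END block.
    unique = sum(1 for n in counts.values() if n == 1)
    percent_unique = unique * 100 / total if total else 0.0
    return total, unique, percent_unique


def is_junction_spanning(cigar):
    """Assumed criterion: the CIGAR string contains at least one N (skipped region)."""
    return any(op == "N" for _length, op in CIGAR_ELEMENT.findall(cigar))


# Made-up example records, for illustration only.
example_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "ACGT", "+", "IIII",
    "@read3", "TTTT", "+", "IIII",
]
total, unique, percent_unique = count_unique_reads(example_fastq)
print(total, unique, round(percent_unique, 2))
print(is_junction_spanning("30M1200N45M"), is_junction_spanning("75M"))
```

Note that a sequence seen twice contributes nothing to the unique count, so this figure differs from the number of distinct sequences.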
Read counts are shown per million reads.
The Picard Tools RnaSeqMetrics function computes the number of bases assigned to various classes of RNA. It also computes the coverage of bases across all transcripts (normalized to a same-sized reference). Computations are based on comparison to a refFlat file. Related text files that were saved:
SRP201470 _rnaseqmetrics_summary.txt
SRP201470 _rnaseqmetrics_hist.txt
For paired-end data, the Picard Tools CollectInsertSizeMetrics function was used to compute the distribution of insert sizes in the accepted_hits.bam file and create a histogram. Related text files that were saved:
SRP201470 _insertmetrics_summary.txt
Samtools produces a summary document that includes the number of reads mapped to each chromosome. Related text files that were saved:
SRP201470 _counts.txt
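As a rough sketch of how the per-chromosome summary can be read back, the snippet below parses `samtools idxstats` output, which is tab-delimited with four columns: reference name, sequence length, mapped reads, and unmapped reads. The function name and the example counts are hypothetical and are not taken from this project's files.

```python
def parse_idxstats(text):
    """Return {reference_name: mapped_read_count} from `samtools idxstats` output."""
    mapped = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, n_mapped, _n_unmapped = line.split("\t")
        if name == "*":
            # The final "*" row holds reads with no reference assigned; skip it.
            continue
        mapped[name] = int(n_mapped)
    return mapped


# Illustrative values only; the chromosome lengths are placeholders.
example_stats = "chr1\t1000\t1200\t3\nchr2\t900\t800\t1\nchrM\t100\t150\t0\n*\t0\t0\t40\n"

per_chromosome = parse_idxstats(example_stats)
# List chromosomes from most to fewest mapped reads.
for name, count in sorted(per_chromosome.items(), key=lambda item: item[1], reverse=True):
    print(name, count)
```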
For samples that contained External RNA Controls Consortium (ERCC) Spike-Ins, dose response curves (i.e. plots of ERCC transcript FPKM vs. ERCC transcript molecules) were created. Ideally, the slope and R² would equal 1.0.
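A minimal sketch of how the slope and coefficient of determination of such a dose response curve could be computed is given below. It assumes an ordinary least-squares fit on log2-transformed values and drops transcripts with zero FPKM; the report does not state the transformation or filtering actually used, and the function name and input values are hypothetical.

```python
import math


def dose_response_fit(molecules, fpkm):
    """Least-squares fit of log2(FPKM) against log2(molecules).

    Returns (slope, intercept, r_squared). Pairs with non-positive values are
    skipped because their logarithm is undefined (an assumption of this sketch).
    """
    pairs = [(math.log2(m), math.log2(f)) for m, f in zip(molecules, fpkm) if m > 0 and f > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two detected ERCC transcripts")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("values must vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up spike-in values in which FPKM is exactly proportional to input molecules.
slope, intercept, r_squared = dose_response_fit([1, 2, 4, 8], [3, 6, 12, 24])
print(round(slope, 3), round(r_squared, 3))
```

When FPKM is exactly proportional to the number of input molecules, as in the made-up values above, both quantities equal 1, which is the ideal case described in the text.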
| PC | Proportion of Variance (%) | Cumulative Proportion of Variance (%) |
|---|---|---|
| PC1 | 86.55 | 86.55 |
| PC2 | 5.472 | 92.02 |
| PC3 | 3.703 | 95.72 |
| PC4 | 1.473 | 97.19 |
| PC5 | 0.9092 | 98.1 |
| PC6 | 0.3561 | 98.46 |
| PC7 | 0.316 | 98.77 |
| PC8 | 0.2364 | 99.01 |
| PC9 | 0.1429 | 99.15 |
| PC10 | 0.1154 | 99.27 |
PCA plots are generated using the first two principal components, colored by known factors (e.g. treatment/disease conditions, tissue, and donors), to visualize similarities between samples and how those similarities correlate with batch effects.
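The cumulative column of the variance table is the running sum of the per-component proportions. The short check below recomputes it from the values printed in the table. Small differences in the last digit are expected, because the report presumably accumulated unrounded values before printing them.

```python
from itertools import accumulate

# Values copied from the table above (PC1 to PC10), in percent.
proportion = [86.55, 5.472, 3.703, 1.473, 0.9092, 0.3561, 0.316, 0.2364, 0.1429, 0.1154]
reported_cumulative = [86.55, 92.02, 95.72, 97.19, 98.1, 98.46, 98.77, 99.01, 99.15, 99.27]

# Running sum of the proportion column.
cumulative = list(accumulate(proportion))

for index, (computed, reported) in enumerate(zip(cumulative, reported_cumulative), start=1):
    print(f"PC{index}: computed {computed:.2f}, reported {reported}")

# Largest disagreement between recomputed and reported values.
max_gap = max(abs(c - r) for c, r in zip(cumulative, reported_cumulative))
print(f"largest difference: {max_gap:.4f}")
```

The first two components, which are the ones plotted, together account for about 92% of the variance.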
Numbers of reads that could not be mapped to any feature (Nofeature count) are shown per million reads, based on htseq-count quantification results.
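As an illustration of the scaling, the sketch below reads an htseq-count output table (tab-delimited feature and count columns, with special counters such as `__no_feature` and `__ambiguous` at the end) and expresses the no-feature count per million reads. The report does not state which denominator was used; this sketch assumes it is the sum of every row in the htseq-count output, and the function name and example counts are hypothetical.

```python
def no_feature_per_million(htseq_text):
    """Scale the __no_feature counter to reads per million.

    Assumption: the denominator is the sum of all rows, i.e. the per-gene
    counts plus the special "__" counters reported by htseq-count.
    """
    gene_total = 0
    special = {}
    for line in htseq_text.splitlines():
        if not line.strip():
            continue
        name, count = line.rsplit("\t", 1)
        if name.startswith("__"):
            special[name] = int(count)
        else:
            gene_total += int(count)
    denominator = gene_total + sum(special.values())
    if denominator == 0:
        return 0.0
    return special.get("__no_feature", 0) * 1_000_000 / denominator


# Made-up counts for two placeholder genes plus two special counters.
example_counts = "GeneA\t700\nGeneB\t100\n__no_feature\t150\n__ambiguous\t50\n"
print(no_feature_per_million(example_counts))
```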
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
locale: LC_CTYPE=en_US.UTF-8, LC_NUMERIC=C, LC_TIME=en_US.UTF-8, LC_COLLATE=en_US.UTF-8, LC_MONETARY=en_US.UTF-8, LC_MESSAGES=en_US.UTF-8, LC_PAPER=en_US.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_US.UTF-8 and LC_IDENTIFICATION=C
attached base packages: parallel, stats4, stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: DESeq2(v.1.28.1), SummarizedExperiment(v.1.18.2), DelayedArray(v.0.14.1), matrixStats(v.0.57.0), Biobase(v.2.48.0), GenomicRanges(v.1.40.0), GenomeInfoDb(v.1.24.2), IRanges(v.2.22.2), S4Vectors(v.0.26.1), BiocGenerics(v.0.34.0), knitr(v.1.30), ggplot2(v.3.3.2), DT(v.0.16), RColorBrewer(v.1.1-2), pander(v.0.6.3), tidyr(v.1.1.2) and rmarkdown(v.2.4)
loaded via a namespace (and not attached): locfit(v.1.5-9.4), Rcpp(v.1.0.5), lattice(v.0.20-41), digest(v.0.6.25), R6(v.2.4.1), RSQLite(v.2.2.1), evaluate(v.0.14), pillar(v.1.4.6), zlibbioc(v.1.34.0), rlang(v.0.4.8), annotate(v.1.66.0), blob(v.1.2.1), Matrix(v.1.2-18), splines(v.4.0.3), labeling(v.0.3), BiocParallel(v.1.22.0), geneplotter(v.1.66.0), stringr(v.1.4.0), htmlwidgets(v.1.5.2), RCurl(v.1.98-1.2), bit(v.4.0.4), munsell(v.0.5.0), compiler(v.4.0.3), xfun(v.0.18), pkgconfig(v.2.0.3), htmltools(v.0.5.0), tidyselect(v.1.1.0), tibble(v.3.0.4), GenomeInfoDbData(v.1.2.3), XML(v.3.99-0.5), crayon(v.1.3.4), dplyr(v.1.0.2), withr(v.2.3.0), bitops(v.1.0-6), grid(v.4.0.3), jsonlite(v.1.7.1), xtable(v.1.8-4), gtable(v.0.3.0), lifecycle(v.0.2.0), DBI(v.1.1.0), magrittr(v.1.5), scales(v.1.1.1), stringi(v.1.5.3), farver(v.2.0.3), XVector(v.0.28.0), genefilter(v.1.70.0), ellipsis(v.0.3.1), generics(v.0.0.2), vctrs(v.0.3.4), tools(v.4.0.3), bit64(v.4.0.5), glue(v.1.4.2), purrr(v.0.3.4), crosstalk(v.1.1.0.1), survival(v.3.2-7), yaml(v.2.2.1), AnnotationDbi(v.1.50.3), colorspace(v.1.4-1) and memoise(v.1.1.0)